
Introduction

Though tempted by the subject and predicted analytical brevity of the wine datasets, I was curious about data related to the 2016 election, which most mainstream analyses got so wrong. (Though it could be argued, as Rafa Irizarry does here, that if you look at confidence intervals, Nate Silver actually did get it right.)

The FEC offers data on campaign contributions by state here. I decided to look at Florida’s because of its swing-state status, interesting mix of demographics, and relatively large data set.

Univariate Plots

## [1] 359419     18
##  [1] "cmte_id"           "cand_id"           "cand_nm"          
##  [4] "contbr_nm"         "contbr_city"       "contbr_st"        
##  [7] "contbr_zip"        "contbr_employer"   "contbr_occupation"
## [10] "contb_receipt_amt" "contb_receipt_dt"  "receipt_desc"     
## [13] "memo_cd"           "memo_text"         "form_tp"          
## [16] "file_num"          "tran_id"           "election_tp"
## 'data.frame':    359419 obs. of  18 variables:
##  $ cmte_id          : Factor w/ 25 levels "C00458844","C00500587",..: 6 7 6 4 6 6 7 7 6 6 ...
##  $ cand_id          : Factor w/ 25 levels "P00003392","P20002671",..: 1 12 1 10 1 1 12 12 1 1 ...
##  $ cand_nm          : Factor w/ 25 levels "Bush, Jeb","Carson, Benjamin S.",..: 4 20 4 5 4 4 20 20 4 4 ...
##  $ contbr_nm        : Factor w/ 92432 levels "'CALLAHAN, THOMAS",..: 73898 47777 14254 41232 59556 3152 43861 43861 72543 48041 ...
##  $ contbr_city      : Factor w/ 1470 levels "","'CALLAHAN",..: 1043 465 1317 1056 1450 790 379 379 134 791 ...
##  $ contbr_st        : Factor w/ 1 level "FL": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_zip       : num  3.33e+08 3.20e+08 3.23e+08 3.31e+08 3.39e+08 ...
##  $ contbr_employer  : Factor w/ 25430 levels "","-","--","-----",..: 22221 16068 21806 22650 15726 15551 20032 20032 15551 15551 ...
##  $ contbr_occupation: Factor w/ 10591 levels "","-","--"," RETIRED AGRICULTURE INSPECTOR",..: 2240 6229 3768 2165 197 8150 827 827 8150 8150 ...
##  $ contb_receipt_amt: num  15 50 100 100 5 ...
##  $ contb_receipt_dt : Factor w/ 650 levels "1-Apr-15","1-Apr-16",..: 298 557 567 401 2 257 557 578 567 173 ...
##  $ receipt_desc     : Factor w/ 46 levels ""," SEE REATTRIBUTION",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd          : Factor w/ 2 levels "","X": 2 1 2 1 2 2 1 1 2 2 ...
##  $ memo_text        : Factor w/ 204 levels ""," SEE REATTRIBUTION",..: 21 11 21 1 21 21 11 11 21 21 ...
##  $ form_tp          : Factor w/ 3 levels "SA17A","SA18",..: 2 1 2 1 2 2 1 1 2 2 ...
##  $ file_num         : int  1091718 1077404 1091718 1077664 1091718 1091718 1077404 1077404 1091718 1091718 ...
##  $ tran_id          : Factor w/ 357787 levels "A001647F906B94BC7AAD",..: 89413 305582 88708 221584 88508 89363 305742 306332 92809 89207 ...
##  $ election_tp      : Factor w/ 4 levels "","G2016","P2016",..: 3 3 3 3 3 3 3 3 3 3 ...

There is a good explanation of the data here. The one major category I found missing was the political party of the candidate, so I went about adding that feature first.
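One way the party feature could be added is with a lookup from candidate name to party. This is only a sketch: the data frame name `contribs` and the lookup vector `party_lookup` are hypothetical, and the toy data stands in for the FEC file. The Democratic, Green, Libertarian, and Independent assignments below are the ones that reproduce the party counts shown in the output; everyone else is treated as Republican.

```r
# Toy stand-in for the FEC data; `contribs` is a hypothetical name.
contribs <- data.frame(
  cand_nm = c("Clinton, Hillary Rodham", "Trump, Donald J.", "Stein, Jill",
              "Johnson, Gary", "McMullin, Evan", "Sanders, Bernard"),
  stringsAsFactors = FALSE
)

# Non-Republican candidates, keyed by the names used in cand_nm.
party_lookup <- c(
  "Clinton, Hillary Rodham" = "democrat",
  "Sanders, Bernard"        = "democrat",
  "O'Malley, Martin Joseph" = "democrat",
  "Lessig, Lawrence"        = "democrat",
  "Webb, James Henry Jr."   = "democrat",
  "Stein, Jill"             = "green",
  "Johnson, Gary"           = "libertarian",
  "McMullin, Evan"          = "independent"
)

# Names missing from the lookup come back as NA and are treated as Republican.
party <- unname(party_lookup[as.character(contribs$cand_nm)])
party[is.na(party)] <- "republican"
contribs$party <- factor(party)

# Number of contributions per party.
table(contribs$party)
```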

##                 Bush, Jeb       Carson, Benjamin S. 
##                      6045                     16072 
##  Christie, Christopher J.   Clinton, Hillary Rodham 
##                       224                    146719 
## Cruz, Rafael Edward 'Ted'            Fiorina, Carly 
##                     29153                      2054 
##      Gilmore, James S III        Graham, Lindsey O. 
##                         1                       184 
##            Huckabee, Mike             Jindal, Bobby 
##                       418                        23 
##             Johnson, Gary           Kasich, John R. 
##                       757                      1342 
##          Lessig, Lawrence            McMullin, Evan 
##                        41                        51 
##   O'Malley, Martin Joseph         Pataki, George E. 
##                       152                        19 
##                Paul, Rand    Perry, James R. (Rick) 
##                      2024                        44 
##              Rubio, Marco          Sanders, Bernard 
##                     20472                     82527 
##      Santorum, Richard J.               Stein, Jill 
##                        85                       353 
##          Trump, Donald J.             Walker, Scott 
##                     50217                       410 
##     Webb, James Henry Jr. 
##                        32
##  [1] Clinton, Hillary Rodham   Sanders, Bernard         
##  [3] Cruz, Rafael Edward 'Ted' Walker, Scott            
##  [5] Bush, Jeb                 Rubio, Marco             
##  [7] Kasich, John R.           Christie, Christopher J. 
##  [9] Johnson, Gary             Trump, Donald J.         
## [11] Paul, Rand                Webb, James Henry Jr.    
## [13] Graham, Lindsey O.        Carson, Benjamin S.      
## [15] Santorum, Richard J.      Fiorina, Carly           
## [17] Jindal, Bobby             Huckabee, Mike           
## [19] O'Malley, Martin Joseph   Pataki, George E.        
## [21] Stein, Jill               Gilmore, James S III     
## [23] McMullin, Evan            Lessig, Lawrence         
## [25] Perry, James R. (Rick)   
## 25 Levels: Bush, Jeb Carson, Benjamin S. ... Webb, James Henry Jr.
##    democrat       green independent libertarian  republican 
##      229471         353          51         757      128787

There were nearly twice as many contributions towards Democratic candidates as towards Republicans, though of course this says nothing about amounts. Third-party contributions trail far behind.

Because of the large disparity in the number of contributions between the least and most popular candidates, a log transform of the y-axis is appropriate.

Presumably the x-axis has been plotted this way because there are large positive and negative amounts. Let’s check those out more closely.

## [1] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [3] REDESIGNATION TO GENERAL REDESIGNATION TO GENERAL
## [5] REDESIGNATION TO GENERAL Refund                  
## 46 Levels:  ... SEE REDESIGNATION
##          cmte_id   cand_id                 cand_nm     contbr_nm
## 257192 C00575795 P00003392 Clinton, Hillary Rodham GOCKE, THOMAS
##        contbr_city contbr_st contbr_zip contbr_employer contbr_occupation
## 257192     JUPITER        FL  334691584   SELF-EMPLOYED         PHYSICIAN
##          amt contb_receipt_dt receipt_desc memo_cd           memo_text
## 257192 20000        29-Jun-15                      REFUNDED ON 6/30/15
##        form_tp file_num tran_id election_tp       date   zip    party
## 257192   SA17A  1024052 C330478       P2016 2015-06-29 33469 democrat

It appears that these negative numbers are in fact legitimate and are either refunds or redesignations to the general election fund. There’s also one very interesting data point: a 20,000 dollar contribution to Hillary Clinton from a self-employed physician in Jupiter, refunded the following day. Let’s try the boxplots again, looking only at the data between the 1st and 99th percentiles.
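The percentile cut-offs can be computed with `quantile()`. The sketch below uses made-up amounts (the variable names `amt`, `bounds`, and `trimmed` are illustrative, not taken from the original code) and shows the effect of keeping only values between the 1st and 99th percentiles. For plotting, the same two bounds could instead be used as axis limits so that no rows are dropped.

```r
set.seed(1)

# Toy amounts: mostly small contributions plus one extreme pair,
# mimicking a large contribution and its refund.
amt <- c(-20000, rlnorm(1000, meanlog = 3.5, sdlog = 1), 20000)

# 1st and 99th percentiles of the contribution amounts.
bounds <- quantile(amt, probs = c(0.01, 0.99))

# Keep only the amounts between those two cut-offs.
trimmed <- amt[amt >= bounds[1] & amt <= bounds[2]]

range(amt)      # includes the extreme values
range(trimmed)  # extremes are gone
```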

When adding a log10 transformation to the x-axis, the distribution looks relatively normal, though there are spikes at round amounts such as 10, 25, 50, 100, 250, 500, 1000, and 2500, because people are more likely to contribute 100 dollars than, say, 88.

## [1] 42066

## 
## LINGOR, MARGARET      SAIFF, IVAN   BAILEY, ANGELA    CAMP, PREMA J 
##              261              252              237              214 
## STANLEY, ANTHONY   BONNEMAN, JACK 
##              204              197

There were quite a few people (42,066) who contributed more than once, and even quite a few who contributed more than 150 times, though the maximum number is 261 times.
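Counts like these can be obtained from a frequency table of contributor names. A minimal sketch with toy names follows; `contbr_nm` mirrors the column name in the data, while `n_by_person` is a made-up helper name. Note that this treats identical names as the same person, which is an assumption.

```r
# Toy contributor names; each element is one contribution.
contbr_nm <- c("A", "B", "A", "C", "A", "B", "D")

# Number of contributions per name.
n_by_person <- table(contbr_nm)

# How many names appear more than once.
n_repeat <- sum(n_by_person > 1)

# Most frequent contributors first.
top_contributors <- sort(n_by_person, decreasing = TRUE)
head(top_contributors)
```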

As expected, the city with the highest number of contributions was Miami, followed by other large urban centers such as Tampa, Orlando, and Naples. There were also some cities unknown to me, such as Boynton Beach, The Villages, and Vero Beach, that had a large number of contributions.

Though this is venturing into the bivariate, bear with me. From the first plot, we were aware that there were more Democratic contributions than Republican. However, a few zip codes stand out when we look beyond this trend. The bars are ordered from the lowest to the highest number of overall contributions. The following zip codes stand out because they have a particularly high ratio of Republican to Democratic contributions:

The following stand out because they have relatively equal ratios of R to D:

And these stand out because they have much higher numbers of D to R:

While there are already interesting trends revealing themselves here, the main purpose of this is to show that most of the highest contributing zip codes also have counterparts as cities. For that reason they are redundant and I will only analyze city from here on, since it is more easily understandable on first sight. If I were building a regression model, zip codes would come more in handy since they are numerical.

While it will be very interesting to see how these break down across party or candidate, for now we know that the top occupations are retired, not employed, and people who did not specify what their employment was.

## 
##                          RETIRED                              N/A 
##                            72010                            52564 
##                    SELF-EMPLOYED                             NONE 
##                            31620                            24684 
##            INFORMATION REQUESTED                     NOT EMPLOYED 
##                            20996                            12379 
##                             SELF                        HOMEMAKER 
##                             5910                             2570 
##            UNIVERSITY OF FLORIDA                 STATE OF FLORIDA 
##                             1064                              807 
##         FLORIDA STATE UNIVERSITY              UNIVERSITY OF MIAMI 
##                              711                              706 
## FLORIDA INTERNATIONAL UNIVERSITY      FLORIDA ATLANTIC UNIVERSITY 
##                              634                              524 
##                AMERICAN AIRLINES 
##                              475

Because the top contributors are retired, not employed or self-employed, using employer information is redundant and will not be further considered. It is interesting to note that the largest employers are universities.

Clearly there were more contributions during the primaries than during the general election, though there was one contribution towards 2020 and several unmarked contributions. Perhaps this is a commentary on the electorate’s enthusiasm about the final candidates.

As my final univariate plot, I look at a histogram of date. Though more contributions were made to primary races, contributions did not pick up until April of 2015, peaking in July and August of 2016. Using a log10 transform on a numeric version of the date did not appear to significantly alter the highly left-skewed distribution.

Univariate Analysis

Dataset structure

This data set includes 359419 contributions (or contribution refunds) towards presidential campaigns in the 2016 election cycle in Florida. Each observation includes the following 18 features: committee id, candidate id, candidate name, contributor name, contributor state (in this case all = FL), contributor zip code, contributor city, contributor employer, contributor occupation, contribution receipt amount, contribution receipt date, receipt description, memo code, memo text, form type, file number, transaction id, and election type.

Of these, I only considered or will consider candidate name (synonymous with candidate id), contributor name, contributor city, contributor occupation, contribution receipt amount, receipt description, date and party.

There is a more detailed explanation of the data available here.

Other observations:

  • There are some negative values because of refunds or because a contribution was reattributed to a spouse or redesignated for the general election rather than the primaries. (This was discovered by reading the memo line of negative contributions.)
  • Far more contributions went towards the primary: 265962 as compared to 92280 towards the general election. (And one towards the 2020 election.) This explains why there are contributions to 25 candidates rather than just a handful.
  • Despite this, most contributions were made in 2016 rather than earlier on.

What is/are the main feature(s) of interest in your dataset?

I am interested in the following questions:

  • Which candidates received the highest number of contributions?
  • What was the distribution of contribution amount for each candidate?
  • How did location and occupation affect which candidate or party the contribution went to?

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Digging deeper I would like to look at the following:

  • How did contributions change over time with regard to who received them? I would suspect that at first contributions were widely spread, but then narrowed in focus after the conventions took place, when third-party candidates got a boost.
  • How did the dollar amount of contributions change over time?
  • Were there people who contributed multiple times? How did their contributions change over time?

Did you create any new variables from existing variables in the dataset?

So far I have created the political party variable. I plan on also finding the mean and median amount of contribution for each candidate and for each day of the election cycle.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

There were a couple of unusual distributions. First off, as previously stated, I was surprised to find negative contribution amounts. However, they were legitimate and I decided to keep them in order to more accurately calculate total and therefore mean contributions. Additionally, I changed the receipt_dt into date format for better time series plots. Finally, I standardized zip codes, as some were in the 9-digit format. Some were also not valid Florida zip codes, as they had fewer than 5 digits or were out-of-state zip codes. According to a reference, all Florida zip codes fall in the range [32000, 35000]. I renamed lengthy variable names and consolidated common values (e.g. “self-employed” and “self employed”). Also, when plotting histograms of the number of contributions or the contribution amount, I used the log10 transformation in order to better visualize the relative distribution.
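The date and zip code clean-up described above might look roughly like the following. This is a sketch on toy values, not the original code: the names `raw_dt`, `raw_zip`, and `zip5` are illustrative, and the date format string is inferred from values such as "29-Jun-15" in the data. Month abbreviations are locale-dependent, so the sketch switches the time locale to "C" first.

```r
# Month abbreviations such as "Jun" are parsed according to the locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Receipt dates arrive as text like "29-Jun-15".
raw_dt <- c("29-Jun-15", "1-Apr-16")
dates  <- as.Date(raw_dt, format = "%d-%b-%y")

# Zip codes arrive as numbers, some with 9 digits (ZIP+4).
raw_zip <- c(334691584, 33139, 1234, 902101234)

# Convert to text without scientific notation, then keep the first 5 digits.
zip_chr <- sprintf("%.0f", raw_zip)
zip5    <- as.integer(substr(zip_chr, 1, 5))

# Mark anything too short or outside the Florida range as missing.
zip5[nchar(zip_chr) < 5 | zip5 < 32000 | zip5 > 35000] <- NA

dates
zip5
```

The first toy zip code is the 9-digit value from the Jupiter record shown earlier, which reduces to 33469.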

Bivariate Plots Section

Since most of the data is categorical, there is only one correlation coefficient (between amount and date) that can be calculated. It is negative, showing that amounts decreased over time, but very weak (-0.198). I will explore much of the bivariate data using boxplots and bar charts.
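A correlation between amount and date requires the date to be treated as a number. The sketch below does this on toy data with `as.numeric()`, which gives days since 1970-01-01 for a `Date`. It uses `cor()` with its default Pearson method; the text does not say which method was used, so that is an assumption.

```r
# Toy data: ten consecutive days with generally falling amounts.
dates <- as.Date("2015-04-01") + 0:9
amt   <- c(500, 250, 250, 100, 100, 50, 50, 25, 25, 10)

# Dates become day counts, so cor() can treat them as a numeric variable.
r <- cor(amt, as.numeric(dates))
r
```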

Democratic contributions, especially below 100 dollars, were much more common while larger contributions were more popular for Republican contributors.

Because there were so many Democratic and Republican contributions, there were many outliers in those boxplots. The medians from lowest to highest were Democrat, Green, Republican and Libertarian, Independent. Democrats had the smallest interquartile range but the most outliers.

When comparing all parties’ contribution amounts, transforming both the x- and y-axes by log10 allows us to see the relatively normal distribution of each. However, it is clear from the first plots that there were more Democratic contributions than contributions to other parties for amounts less than 100 dollars, but that after that point, the number of Republican contributions is equal to or greater than the number for Democrats. (As shown in the previous amount histograms.)

Adding a jitter to these boxplots may help us see how densely packed they are.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -20000.0     10.0     25.0    101.2     50.0  20000.0

From this graph it’s clear that Clinton and Sanders received far more contributions than the other Democratic candidates, and this helps to explain the number of outliers for these two candidates. It also appears they are the only ones with significant negative “contributions”. O’Malley had a higher 3rd quartile and median than his other two lesser-known competitors. Let’s take a look at just Clinton’s and Sanders’s boxplots.

So it’s clear that Clinton received more contributions with higher amounts than Sanders. As can be seen, many of the most popular values are multiples of 100 or 1000, especially 1000 and also 1500, 1750, and 2000 for Clinton.

Both Clinton and Sanders had a median contribution amount of 25 dollars, though Clinton’s 3rd quartile is 75 while Sanders’s is 50. Let’s take a look at the Republican candidates’ amount boxplots.

As in the Democratic boxplots, there are round amounts (such as 500 and 1000) that are more common. Interestingly, many candidates’ 3rd quartiles reach the 99% cut-off, including Bush, Christie, Jindal, and Pataki, meaning they had several large contributions.

The fact that the more popular candidates (Cruz, Carson, Rubio, and Trump) do not have high 3rd quartiles does not mean that they did not receive such contributions, just that they have many more contributions that fall in a lower range, and therefore large donations are deemed outliers. Let’s take a look at it without the jitter.

This plot actually shows things a bit more clearly than the boxplots with jitter. Let’s focus in on the four Republican contenders with the largest number of contributions.

Interestingly, though Trump boasted about how most of his contributions were in small amounts, his median is higher than Cruz’s and Carson’s. Rubio has the highest median and largest IQR. How about the third party candidates?

Without limiting the data we can see that Johnson had many more outliers beyond the third quartile than the other two, while Stein had a few and one distinct outlier. McMullin had only one, but had the highest median of the three.

How about trying a violin plot to see the distribution for the major candidates?

Nothing out of the ordinary here: the distributions seem relatively normal for each. However, the distribution is widest for Clinton (thanks in large part to our self-employed physician from Jupiter) and narrowest for Trump.

It makes sense that Trump and Clinton got the largest number of contributions for their parties since they ended up being the party nominees.

While the disparities here are interesting, it’s a bit difficult to read on this stacked bar chart, so I’ve decided to facet it out.

The interesting points here are Rubio’s popularity in Miami as compared to his Republican competitors, and that Sanders received more contributions in Gainesville than Clinton, though the opposite is true in every other city.

Naples stands out as the only city with more Republican contributions than Democratic, but it’s difficult to tell with the smaller bars. Let’s dig into the ratios a bit more.

We can see that there are relatively few cities where the ratio of Democratic contributions to Republican contributions is less than one (i.e. there were more Republican contributions). These cities were Vero Beach, Naples, and The Villages. On the other hand, Gainesville had over 7 times as many Democratic contributions and Miami Beach had over 5 times as many. The degree to which a city is Democratic or Republican does not seem to correlate with the total number of contributions, which might somewhat approximate the size or population of the city.
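The Democratic-to-Republican ratio per city can be derived from a two-way table of city by party. The following sketch uses a handful of invented rows (the counts are not from the data) with illustrative vector names `city` and `party`.

```r
# Toy contributions: one element per contribution.
city  <- c("MIAMI", "MIAMI", "MIAMI",
           "NAPLES", "NAPLES", "NAPLES",
           "GAINESVILLE", "GAINESVILLE")
party <- c("democrat", "democrat", "republican",
           "republican", "republican", "democrat",
           "democrat", "republican")

# Rows are cities, columns are parties.
counts <- table(city, party)

# Ratio above 1 means more Democratic contributions, below 1 more Republican.
ratio <- counts[, "democrat"] / counts[, "republican"]

# Cities ordered from most Republican to most Democratic.
sort(ratio)
```

A city with no Republican contributions would give a ratio of `Inf`, so on real data it may be worth restricting to cities with enough contributions first.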

Across parties, most contributions came from retired people. For Democrats, the occupations of the next top contributors were “not-employed”, “information requested” and attorney. For Republicans, not-employed was the lowest category, and for the third parties this category did not exist. The next highest contributors among Republicans were “Information Requested”, which was significantly higher than among Democrats, and then attorney. Other occupations notably scarce among Republicans when compared to Democrats were managers, teachers, and professors. Of the third-party contributions, Libertarians had the widest spread of occupations, followed by Greens and then Independents. However, there were zero professors for Libertarians, while professor was one of the few occupations supporting Greens. Note that the y-axis has undergone a log transformation for the third parties but not the main parties so that it is easier to see which occupations contributed.

The only difference between Trump’s distribution among occupations and the overall Republican distribution is that there were fewer contributions from homemakers and more from “information requested” for Trump. Far fewer self-employed and unemployed people contributed to Clinton as compared to Sanders.

Not surprisingly, CEOs had the largest IQR and highest median, while the unemployed had the lowest median, followed by professors and teachers.

Though there were some contributions so early in the campaign, I’m going to zoom in on when most of the contributions came, namely starting April 2015.

This paints a very interesting picture indeed, though it is still too noisy. Though there are peaks in the dollar amount of contributions in 2016, the daily medians are much higher earlier on, before the conventions. This needs to be smoothed out, as there is currently one bin per day.

According to this, mean and median contribution dollar amounts decreased from April 2015 on, though the earlier univariate histogram showed that the number of contributions increased in the final few months of the election. Though the graphs have similar shapes, it is worth pointing out that the scales on the y-axes are quite different.
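Daily mean and median amounts can be computed by grouping on the date. Here is a small sketch using `tapply()` on invented data; the data frame `contribs` and the result `daily` are hypothetical names. It also illustrates why the two y-axis scales differ: a single large contribution pulls the mean far above the median.

```r
# Toy data: two days of contributions.
contribs <- data.frame(
  date = as.Date(c("2015-04-01", "2015-04-01",
                   "2015-04-02", "2015-04-02", "2015-04-02")),
  amt  = c(100, 300, 10, 25, 1000)
)

# One value per day for each statistic.
daily_mean   <- tapply(contribs$amt, contribs$date, mean)
daily_median <- tapply(contribs$amt, contribs$date, median)

# Collect into a data frame that is convenient for plotting.
daily <- data.frame(
  date   = as.Date(names(daily_mean)),
  mean   = as.vector(daily_mean),
  median = as.vector(daily_median)
)
daily
```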

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset? Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Bivariate plots supported truisms regarding party and candidate affiliation. Contributions for Democratic candidates were likely to be smaller in amount and come from the unemployed, teachers, and professors. This was particularly true for Sanders as Clinton received more contributions of higher amounts. Contributions for Republicans tended to be of higher amount and were more popular among self-employed or business owners.

Interesting information about candidate and party support was shown across different cities, with Gainesville, a large university city, showing the highest ratio of Democratic to Republican contributions, particularly for Sanders. Rubio had a relatively large following in Miami, which makes sense since that is where he is from.

Though the univariate analyses showed that the number of contributions spiked in the summer of 2016, the final plot of the bivariate analysis showed that the dollar amount of the contributions generally declined from April 2015 on.

What was the strongest relationship you found?

Democrats receive higher numbers of contributions but of a lower dollar amount.

Multivariate Plots Section

While the number of contributions was much higher in the primaries, it is interesting to see how Republican contributions became left-skewed towards higher dollar amounts in the general election.

Arranging these boxplots by medians shows there is no discernible relationship between median contribution amount in a city and its democratic to republican ratio.

In all occupations except for “Information Requested” Republican contributions have a higher dollar median. This is especially true of homemakers, attorneys, physicians, and CEOs.

People who contribute multiple times are far more likely to be contributing to Democratic campaigns and Republican multiple contributors had higher median amounts. Some of the multiple contributors have several negative values showing that they refund or reassign their contributions.

The median amounts for multiple Democratic contributors were all below 50 dollars, but other than that there was not much to unite them. Sanders contributors did not switch to Clinton after the primary.

Edwin Gray contributed to three separate candidates, though he strongly preferred Cruz. James Coffman only contributed to Cruz.

The abrupt shift towards the party candidate is apparent starting in June, otherwise there is too much noise to discern patterns.

As suspected, the majority of third-party contributions came towards the end of the campaign. Of the third parties, Green had the earliest start. Interestingly, while Democratic contributions stayed relatively steady in the last few months, with a small upswing in number and amount in August 2016, there were gaps in Republican contributions of lower amounts (<100) in June 2016 and after August 2016. The smoother shows the uptick in the Republican contribution amount towards the end.

Both general and primary contributions show a normal distribution with spikes at round amounts; however, there were many more contributions to the primaries, as previously shown. While there were more Democratic contributions overall, Republican contributions were more left-skewed in the general election.

Though Clinton received the party nomination, the dollar amount of her contributions did not jump the way Trump’s did once he received his. Republican contributions had far higher variability than Democratic contributions.

This plot breaks down contributions towards candidates across time, faceted into amounts of less than 500, between 500 and 2500, and above 2500.

Here is a similar plot which uses only a sample of the data since with all of the data it is impossible to notice the different sizes of contributions.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Again it was shown that contributions to Democratic candidates tended to be smaller in dollar amount, but more frequent. Republican contributions were particularly skewed towards larger amounts in the general election.

Were there any interesting or surprising interactions between features?

  • The Democratic to Republican ratio did not affect the city median dollar amount of contributions as I expected it to.

  • The median dollar amount in different occupations was highly split depending on party.

  • People who contributed multiple times were more likely to be Democratic contributors than Republican ones.

  • Smaller dollar amount contributions effectively stopped once Trump was announced as the party candidate.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

No I did not. Almost all of the features were non numerical, meaning I would have to codify those into numerical values in order to create a regression model. This could be interesting in order to create a model predicting which party or candidate a contribution will go towards based on the contribution amount, city, occupation and date.


Final Plots and Summary

Plot One

Description One

This plot shows the cities ranging from most Democratic to most Republican in terms of the ratio of contributions. In the top line graph, there is a horizontal line at y = 1 to show where perfectly equal contribution would be. Three cities fall below the line, while the rest are above, and in some cases the ratio is many times greater Democrat than Republican. Vero Beach, the most Republican, has a ratio of 0.78, meaning that there are about 1.28 times as many Republican contributions as Democratic ones there (1 / 0.78 ≈ 1.28). The point of the lower bar chart is to show that this ratio does not necessarily correspond to the total number of contributions, a somewhat poor proxy for the population. I would have liked to overlay the line upon the bar graph, but I could not find a way in R to have two different y-axes with different scales, one on the left and one on the right. Additionally, it might be better to find the actual population of these cities to see if city size does correlate with the party ratio. It might also be interesting to find other city demographics, such as median income or median age, and compare them to this ratio.

Plot Two

Description Two

This graph shows how contribution dollar amounts changed over time in the five parties. Third party candidates received support later on, while it is interesting to notice the gaps in Republican contributions of lower amount in the final months of the campaign.

Plot Three

Description Three

This graph shows the contributions toward various candidates over time, with the size of the point roughly corresponding to the dollar amount of the contribution. I scaled these using the scale function, which normalizes the distribution around a center, which I chose to be one. It can be seen that the majority of Trump’s contributions came towards the end, and that they were larger than contributions towards some of his opponents such as Cruz or Carson. Of the Republican candidates, Rubio and Bush received the largest contributions. Sanders’s contributions were generally smaller than Clinton’s.


Reflection

This data set gives information on over 350,000 contributions made towards the 2016 presidential election in Florida. I initially explored various categories, discarded some because of redundancy, and added a category for the political party of the candidate. Indeed, I used this category more so than candidate to look for trends, since there were 25 different candidates. Most of the trends or correlations I discovered fit with stereotypes regarding occupations and contribution amounts.

I received a lot of practice in subsetting data to exclude less common factors and also learned the importance of using coord_cartesian() rather than xlim or ylim to choose the graph area of boxplots.

Specifically, I wasn’t sure at first how to create a subset of data based on frequency counts of certain categories. I had to do this because graphs that included all candidates, cities, or occupations would contain too much information. I learned that the way to subset the data to include only cities with, say, more than 2000 contributions was to create a contingency table and subset based on that.
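The contingency-table approach can be sketched as follows on toy data. The names `contribs`, `min_n`, and `big_cities` are illustrative, and the threshold is 2 here rather than 2000 so that the example stays small.

```r
# Toy data: 5 contributions from Miami, 3 from Tampa, 1 from Ocala.
city     <- c(rep("MIAMI", 5), rep("TAMPA", 3), "OCALA")
contribs <- data.frame(city = city, amt = seq_along(city),
                       stringsAsFactors = FALSE)

# Threshold on the number of contributions (the text uses 2000).
min_n <- 2

# Contingency table of contributions per city.
city_counts <- table(contribs$city)

# Names of the cities that exceed the threshold.
big_cities <- names(city_counts[city_counts > min_n])

# Keep only rows from those cities.
top <- contribs[contribs$city %in% big_cities, ]

# Drop unused levels so plots do not show empty categories.
top$city <- droplevels(factor(top$city))
table(top$city)
```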

Additionally, when I first created many of the plots I used “xlim =” or “ylim =” to focus on a part of the graph, without the coord_cartesian wrapper. Doing this actually subsets the data so that when a boxplot is created, the range, median, and IQR do not describe the full data set. While this looks nice because all boxes fit neatly within the plot window, it does not accurately represent the data. In doing so, I inadvertently drew false conclusions about the data. At some point I realized what I was doing and went back and changed the code wherever appropriate.
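The difference can be illustrated numerically. In the sketch below (toy amounts, illustrative names), the first median is what a boxplot reports when values outside the limits have been dropped, as happens with `ylim()`, and the second is what it reports when the plot is only zoomed with `coord_cartesian()`. The ggplot2 part is guarded so that it runs only if the package is installed.

```r
# Toy amounts with a long right tail.
amt <- c(5, 10, 25, 25, 50, 100, 250, 1000, 2700)

# Setting scale limits discards values outside the limits *before*
# the boxplot statistics are computed.
inside            <- amt[amt >= 0 & amt <= 100]
median_after_ylim <- median(inside)

# Zooming leaves the data alone, so the statistics describe everything.
median_full <- median(amt)

c(dropped = median_after_ylim, zoomed = median_full)

# The same contrast as two ggplot2 plots, if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  d <- data.frame(cand = "A", amt = amt)
  base_plot <- ggplot2::ggplot(d, ggplot2::aes(cand, amt)) +
    ggplot2::geom_boxplot()

  # Zoom only: box and median are based on all nine values.
  p_zoom <- base_plot + ggplot2::coord_cartesian(ylim = c(0, 100))

  # Scale limits: the three values above 100 are removed first.
  p_drop <- base_plot + ggplot2::ylim(0, 100)
}
```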

One other lesson I learned from a mistake was to name variables carefully. I named one of my line graphs “median” because it tracked median values over time. Doing this caused all of my other plots that used “FUN = median” to order elements on the x-axis to stop working. I now know how important it is to give variables names that are not the same as functions.

Further extension would include creating a model based on these categories to predict which candidate or party a contribution would go towards based on location, occupation, time, and contribution amount, but these variables would have to be converted to numerical values first. To make this information more interesting, I would want to add additional data about the contributors such as age and sex as well as more demographic information about cities to make a more accurate predictive model.